Analysing massive open human mobility data using {spanishoddata}, {duckdb}, and flowmaps

July 3, 2025

Spanish Open Mobility Big Data

About 5 years of daily data on hourly flows

Data by Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024)

Based on 13 million Orange Spain customers, expanded to the full population of Spain

Data interface

3500+ zones across Spain and beyond
Get the data with {spanishoddata}

Get the data

Time-consuming options

Download one by one?

Write your own XML parser?
  • Custom code to download and import multiple days
  • Variable names in Spanish
  • No guarantee of consistent variable types
  • Limited by available memory
  • Slow data processing (raw CSV data)
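
Two of the pitfalls above, Spanish variable names and inconsistent variable types, can be sketched with base R alone. This is a minimal illustration, not the actual file layout: the column names `origen`, `destino` and `viajes` and the values are assumptions made up for the example.

```r
# Toy CSV with made-up values; column names are illustrative assumptions
csv_text <- "origen,destino,viajes\n01001,01002,4.894\n01001,28079,1.779\n"

# Naive import: zone identifiers are guessed to be integers,
# so the leading zero of "01001" is silently lost
naive <- read.csv(text = csv_text)
str(naive$origen)

# Explicit column types keep identifiers as text
typed <- read.csv(
  text = csv_text,
  colClasses = c(origen = "character", destino = "character", viajes = "numeric")
)

# Rename to English names by hand
names(typed) <- c("id_origin", "id_destination", "n_trips")
str(typed)
```

Doing this by hand for every file and every table is exactly the kind of custom code the list above warns about.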

The fastest way

Use the {spanishoddata} package

library(spanishoddata)
spod_set_data_dir("data")

od_data <- spod_get(
  type = "origin-destination",
  zones = "districts",
  dates = c(
    start = "2022-03-01",
    end = "2022-03-07"
  )
)

Get the data

The fastest way

Use the {spanishoddata} package

library(spanishoddata)
spod_set_data_dir("data")

od_data <- spod_get(
  type = "origin-destination",
  zones = "districts",
  dates = c(
    start = "2022-01-01",
    end = "2022-01-04"
  )
)
library(dplyr)
glimpse(od_data)

Rows: ??
Columns: 20          
Database: DuckDB v1.2.1 [root@Darwin 24.4.0:R 4.5.0/:memory:]
$ date                        <date> 2022-01-04, 2022-01-04, 2
$ hour                        <int> 0, 0, 0, 1, 1, 3, 4, 4, 5,…
$ id_origin                   <fct> 01001, 01001, 01001, 01001
$ id_destination              <fct> 01009_AM, 01009_AM, 01009_…
$ distance                    <fct> 2-10, 2-10, 2-10, 2-10, 2-
$ activity_origin             <fct> home, frequent_activity, w…
$ activity_destination        <fct> frequent_activity, home, h…
$ study_possible_origin       <lgl> FALSE, FALSE, FALSE, FALSE…
$ study_possible_destination  <lgl> FALSE, FALSE, FALSE, FALSE…
$ residence_province_ine_code <fct> 01, 01, 01, 01, 01, 01, 01
$ residence_province_name     <fct> "Araba/Álava", "Araba/Álav…
$ income                      <fct> 10-15, >15, >15, >15, >15,…
$ age                         <fct> NA, NA, NA, NA, NA, NA, NA…
$ sex                         <fct> NA, NA, NA, NA, NA, NA, NA…
$ n_trips                     <dbl> 4.894, 1.779, 1.094, 1.094…
$ trips_total_length_km       <dbl> 27.966, 5.997, 4.081, 4.16…
$ year                        <int> 2022, 2022, 2022, 2022, 20…
$ month                       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day                         <int> 4, 4, 4, 4, 4, 4, 4, 4, 4,…
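
The `Rows: ??` and `Database: DuckDB` lines in the output above indicate that the result is a lazy database table rather than an in-memory data frame. The following is a minimal sketch of that idea on a toy table, assuming the {DBI}, {duckdb}, {dplyr} and {dbplyr} packages are installed. The table `toy_flows` and its values are made up for illustration and are not part of the mobility data.

```r
library(DBI)
library(duckdb)
library(dplyr)

# Toy stand-in for the mobility data (made-up values)
toy_flows <- data.frame(
  id_origin = c("A", "A", "B", "B"),
  id_destination = c("B", "C", "A", "C"),
  hour = c(7L, 8L, 7L, 18L),
  n_trips = c(10.5, 3.2, 8.0, 1.3)
)

# In-memory DuckDB database holding the toy table
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "toy_flows", toy_flows)

# Lazy reference: nothing is pulled into R yet
flows_tbl <- tbl(con, "toy_flows")
is_lazy <- inherits(flows_tbl, "tbl_lazy")

# collect() runs the query in DuckDB and returns an ordinary tibble
morning <- flows_tbl |>
  filter(hour < 12) |>
  collect()

dbDisconnect(con, shutdown = TRUE)
```

Until `collect()` is called, only a description of the query exists in R, which is why the number of rows is unknown when the table is printed.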

Big Data on a Small Laptop

DuckDB in Action

Imagine a typical laptop

DuckDB in Action

Filter and summary

library(dplyr)

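od_data |>
  # keep only February to April 2022
  filter(
    year == 2022,
    month %in% c(2, 3, 4)
  ) |>
  # average number of trips per record
  summarise(
    n_trips = mean(n_trips)
  ) |>
  # run the query in DuckDB and return the result to R
  collect()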
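
To see what such a pipeline does before it runs, the generated SQL can be inspected with `show_query()`. This is a hedged sketch on a toy table, assuming {DBI}, {duckdb}, {dplyr} and {dbplyr} are installed. The table `toy_od` and its values are invented for the example.

```r
library(DBI)
library(duckdb)
library(dplyr)

# Toy table with made-up values
toy_od <- data.frame(
  year = c(2022L, 2022L, 2022L, 2023L),
  month = c(1L, 2L, 3L, 2L),
  n_trips = c(2, 4, 6, 100)
)

con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "toy_od", toy_od)

# Build the query lazily: same verbs as in the slide above
query <- tbl(con, "toy_od") |>
  filter(year == 2022, month %in% c(2, 3, 4)) |>
  summarise(n_trips = mean(n_trips, na.rm = TRUE))

# Print the SQL that DuckDB will execute
show_query(query)

# Only the one-row result is brought into R
result <- collect(query)

dbDisconnect(con, shutdown = TRUE)
```

The filtering and averaging happen inside DuckDB, so R only ever holds the small summary, not the filtered rows.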

DuckDB in Action

Summary on full data set

library(dplyr)

od_data |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()

DuckDB in Action

Summary over multiple groups on full data set

library(dplyr)

od_data |>
  group_by(
    year,
    month,
    day,
    id_origin,
    id_destination
  ) |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()

Get in touch

Egor Kotov

ekotov.pro

References

Boyandin, Ilya. 2024. Flowmap.blue Widget for R. https://doi.org/10.32614/CRAN.package.flowmapblue.
Kotov, Egor, Robin Lovelace, and Eugeni Vidal-Tortosa. 2024. Spanishoddata. https://doi.org/10.32614/CRAN.package.spanishoddata.
Mast, Johannes. 2024. Flowmapper: Draw Flows (Migration, Goods, Money, Information) on 'ggplot2' Plots. https://github.com/JohMast/flowmapper.
Ministerio de Transportes y Movilidad Sostenible (MITMS). 2024. “Estudio de La Movilidad Con Big Data (Study of Mobility with Big Data).” https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
Mühleisen, Hannes, and Mark Raasveldt. 2024. Duckdb: DBI Package for the DuckDB Database Management System. https://doi.org/10.32614/CRAN.package.duckdb.
Raasveldt, Mark, and Hannes Muehleisen. 2018. “DuckDB.” https://github.com/duckdb/duckdb.